A Model for the Representation and Focussed Retrieval of Structured Documents Based on Fuzzy Aggregation
Abstract
Effective retrieval of structured documents should exploit the content and structural knowledge associated with the documents. This knowledge can be used to focus retrieval to the best entry points: document components that contain relevant information, and from which users can browse to retrieve further relevant components. To enable this, suitable representation methods must be developed. This paper presents a model for representing structured documents to allow for their focussed retrieval. The model is founded on fuzzy aggregation, an approach based on the fuzzy representation of linguistic quantifiers and ordered weighted averaging operators. By defining the representation of a document component as the fuzzy aggregation of its related components, we arrive at a document representation that supports the selection of best entry points.
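As a rough illustration of the aggregation step described above, the following sketch computes an ordered weighted averaging (OWA) score whose weights are derived from a linguistic quantifier. It uses the standard construction w_i = Q(i/n) - Q((i-1)/n) for a monotone quantifier Q with Q(0) = 0 and Q(1) = 1. The function names, the quantifier `most` (taken here as Q(r) = r^2), and the idea of aggregating per-term weights of child components are assumptions made for illustration; the paper's exact formulation may differ.

```python
from typing import Callable, Sequence


def quantifier_weights(n: int, quantifier: Callable[[float], float]) -> list[float]:
    """Derive n OWA weights from a monotone quantifier Q with Q(0)=0 and Q(1)=1.

    Uses w_i = Q(i/n) - Q((i-1)/n), so the weights sum to Q(1) - Q(0) = 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return [quantifier(i / n) - quantifier((i - 1) / n) for i in range(1, n + 1)]


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def most(r: float) -> float:
    """Hypothetical fuzzy quantifier standing in for 'most' (an assumption)."""
    return r ** 2


def aggregate_term_weight(
    child_weights: Sequence[float],
    quantifier: Callable[[float], float] = most,
) -> float:
    """Aggregate one term's weights across child components into a parent weight.

    Illustrative only: returns 0.0 for a component with no children.
    """
    if not child_weights:
        return 0.0
    weights = quantifier_weights(len(child_weights), quantifier)
    return owa(child_weights, weights)


if __name__ == "__main__":
    # Weights of one term in three hypothetical sections of a chapter.
    print(aggregate_term_weight([0.2, 0.9, 0.5]))
```

With the identity quantifier Q(r) = r the operator reduces to the arithmetic mean, while weight vectors concentrated on the first or last position give the maximum or minimum. This range of behaviours is what lets a quantifier such as "most" or "at least one" control how strongly a parent component reflects its children.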
Similar resources
Structured document retrieval using Dempster-Shafer's Theory of Evidence: Implementation and evaluation
Documents are often structured, for example into chapters, each with sections, and so on. The representation of structured documents should provide for a focussed retrieval of those components, individual or aggregated, of the documents that are most relevant to an information need. In previous work, we developed a model for the representation and the retrieval of structured documents. This mod...
Aggregated Representation for the Focussed Retrieval of Structured Documents
Effective retrieval of structured documents should exploit the content and structural knowledge associated with the documents. This knowledge can be used to focus retrieval to the best entry points: document components that contain relevant information, and from which users can browse to retrieve further relevant components. To enable this, the representation of a document component is defined ...
Aggregation-Based Structured Text Retrieval
Definition: Text retrieval is concerned with the retrieval of documents in response to user queries. This is achieved by (i) representing documents and queries with indexing features that provide a characterisation of their information content, and (ii) defining a function that uses these representations to perform retrieval. Structured text retrieval introduces a finer-grained retrieval paradig...
Searching XML Documents - Preliminary Work
Structured document retrieval aims at exploiting the structure together with the content of documents to improve retrieval results. Several aspects of traditional information retrieval applied on flat documents have to be reconsidered. These include in particular, document representation, storage, indexing, retrieval, and ranking. This paper outlines the architecture of our system and the adapt...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...